Skip to content

fix: apply the left side schema on the right side in set expressions#21052

Merged
jonahgao merged 1 commit intoapache:mainfrom
splitgraph:fix-set-expr-via-planner-context
Apr 9, 2026
Merged

fix: apply the left side schema on the right side in set expressions#21052
jonahgao merged 1 commit intoapache:mainfrom
splitgraph:fix-set-expr-via-planner-context

Conversation

@gruuya
Copy link
Copy Markdown
Contributor

@gruuya gruuya commented Mar 19, 2026

Which issue does this PR close?

Rationale for this change

DataFusion requires all projected expressions to have unique names during planning, so it doesn't support select 0, 0 for instance.

However this shouldn't be an issue when this is just a sub-SELECT in a larger query which does abide by this rule. For example a set expression (UNION, EXCEPT, INTERSECT) query should only require the first SELECT to provide a unique schema, and that should be sufficient.

Furthermore, this requirement is even more redundant, since all field name/aliases other than those in the first SELECT are discarded anyway.

What changes are included in this PR?

  • when we're processing a set expression (UNION, EXCEPT, INTERSECT), save the left side schema to planner context
  • when we're inside SqlToRel::select_to_plan pop the schema and pass it down to
  • a new project_with_validation_and_schema function in LogicalPlanBuilder to properly alias them

The benefit of this approach compared to #20819 is that wildcards are unwrapped and we can properly handle them as well.

The downside is that we need to thread the left schema via the planner context now.

Are these changes tested?

Yes, there are unit tests and SLTs.

Are there any user-facing changes?

New method in LogicalPlanBuilder called project_with_validation_and_schema which will alias the projection with the provided schema.

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Mar 19, 2026
@gruuya gruuya force-pushed the fix-set-expr-via-planner-context branch 2 times, most recently from 706f8f8 to 6669e8b Compare March 19, 2026 13:54
Copy link
Copy Markdown
Member

@jonahgao jonahgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a minor suggestion; the overall LGTM. Thank you @gruuya

if let Some(schema) = &schema {
for (expr, field) in projected_expr.iter_mut().zip(schema.fields()) {
if !matches!(expr, Expr::Column(_) | Expr::Alias(_)) {
*expr = expr.clone().alias(field.name());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloning an expression is typically expensive. How about

projected_expr = projected_expr
            .into_iter()
            .zip(schema.fields().iter())
            .map(|(expr, field)| {
                if !matches!(expr, Expr::Column(_) | Expr::Alias(_)) {
                    expr.alias(field.name())
                } else {
                    expr
                }
            })
            .collect();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, I replaced the clone with take, so we just replace the expr in-place

diff --git a/datafusion/expr/src/logical_plan/builder.rs b/datafusion/expr/src/logical_plan/builder.rs
index 1d257c775..188daa724 100644
--- a/datafusion/expr/src/logical_plan/builder.rs
+++ b/datafusion/expr/src/logical_plan/builder.rs
@@ -2017,7 +2017,7 @@ fn project_with_validation(
     if let Some(schema) = &schema {
         for (expr, field) in projected_expr.iter_mut().zip(schema.fields()) {
             if !matches!(expr, Expr::Column(_) | Expr::Alias(_)) {
-                *expr = expr.clone().alias(field.name());
+                *expr = std::mem::take(expr).alias(field.name());
             }
         }
     }

@gruuya gruuya force-pushed the fix-set-expr-via-planner-context branch from 6669e8b to afab18e Compare April 9, 2026 08:52
@jonahgao jonahgao added this pull request to the merge queue Apr 9, 2026
Merged via the queue into apache:main with commit 8cf70ec Apr 9, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected set expression query planning error

2 participants